Genome Medicine
○ Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match Genome Medicine's content profile, based on 154 papers previously published here. The average preprint has a 0.32% match score for this journal, so anything above that is already an above-average fit.
Yepez, V. A.; Luknarova, R.; Beijer, D.; Estevez-Arias, B.; Mei, D.; Morsy, H.; Mueller, J. S.; Polavarapu, K.; Demidov, G.; Doornbos, C.; Ellwanger, K.; Krass, L.; Laurie, S.; Matalonga, L.; Abdelrazek, I. M.; Astuti, G.; Bisulli, F.; Brechtmann, F.; Dabad, M.; Denomme Pichon, A. S.; Drakos, M.; Eddafir, Z.; Garrabou, G.; Guerrini, R.; Johari, M.; Kegele, J.; Kilicarslan, O. A.; Koelbel, H.; Kolen, I. H. M.; Licchetta, L.; Lochmueller, H.; Maassen, K.; Macken, W.; Mertes, C.; Milisenda, J. C.; Minardi, R.; Mostacci, B.; Neveling, K.; Oud, M. M.; Park, J.; Pujol, A.; Roos, A.; Sagath, L.; van
RNA sequencing (RNA-seq) provides a powerful complement to DNA sequencing for uncovering pathogenic defects affecting gene expression and splicing in individuals with genetically undiagnosed rare disorders. However, as large rare disease consortia adopt RNA-seq, challenges arise due to cohort heterogeneity, variability in tissues and sample sizes, and differences in interpretation practices. Here, we present a harmonized analytical and interpretation framework developed by the pan-European Solve-RD consortium to address these challenges. We analyzed 521 RNA-seq samples from whole blood, fibroblasts, muscle and peripheral blood mononuclear cells collected across more than 30 clinics and five European Reference Networks. Aberrant expression and splicing events were identified using OUTRIDER and FRASER 2.0 and analyzed through a standardized four-level scoring framework that encompassed RNA-seq outlier reliability, phenotype relevance, variant mechanism, and segregation evidence, captured in structured reports for interpretation. Regular meetings and collaborative "Solvathon" workshops were used to evaluate variant pathogenicity. This effort resulted in molecular diagnoses for 19 families out of 248 (7.7%) for whom DNA analyses had been inconclusive. Furthermore, three cases diagnosed using DNA analyses were confirmed, and 49 candidate events and five novel candidate disease genes were identified in the remaining families. Our results demonstrate the feasibility and impact of large-scale, standardized RNA-seq analysis in a transnational research setting. This framework provides a model for other international initiatives such as the Undiagnosed Diseases Network and ERDERA, paving the way for broader clinical implementation of transcriptome-based rare disease diagnostics.
Prentice, A. J.; McSalley, I.; Magielski, J. H.; Mercurio, J.; Tefft, S.; Winters, A.; Kaufman, M. C.; Ruggiero, S. M.; McGarry, L. M.; Hood, V.; McKee, J. L.; Goldberg, E. M.; Helbig, I.
SCN1A-related disorders are the single most common monogenic cause of epilepsy and represent a major focus of precision medicine efforts. In conjunction with existing prospective studies, the analysis of real-world data obtained during routine clinical care can expand upon the scale and duration of available data and contribute to the development of meaningful outcomes for clinical trials. Here, we leveraged real-world data to delineate the longitudinal disease history of 100 individuals with SCN1A-related disorders using a systematic approach. We mapped a total of 671 unique clinical terms to a standardized framework in monthly increments across 681 patient-years, including 75 terms related to seizure types. Within this cohort, 89 individuals had presumed loss-of-function variants in SCN1A based on variant type and clinical diagnosis, including those with Dravet syndrome (N = 79) and genetic epilepsy with febrile seizures plus (N = 10). Ten individuals had a non-Dravet developmental and epileptic encephalopathy caused by gain-of-function variants in SCN1A. By annotating seizure type and frequency in monthly time-bins, we assessed seizure burden. A median of 17 changes in seizure frequency and ten terms referring to seizure type were identified per participant. Myoclonic seizures occurred with high frequency (median >5 daily), whereas hemiclonic, focal impaired consciousness, and bilateral tonic-clonic seizures occurred more rarely (median monthly). Retrospective analysis of developmental histories showed a range of cognitive abilities. Neurodevelopmental differences were observed in 83% (83/100) of individuals, of whom 83% (69/83) demonstrated delayed language skills. Motor coordination impairments, including gait disturbance, ataxia, hypotonia, and imbalance were annotated in 69% (69/100) of participants. EEG findings varied with age; most were reported as normal before nine months of age, after which the prevalence of abnormal interictal findings increased. 
Individuals with different clinical syndromes had unique medication landscapes, with 554 prescriptions of 37 unique therapies. Changes in treatment coincided with the diagnosis of an SCN1A-related disorder, with an increase in cannabidiol, clobazam, and fenfluramine and reduction in sodium channel-blocker use following genetic diagnosis. In summary, we reconstructed the longitudinal disease history of SCN1A-related disorders from electronic medical records using a standardized framework for the analysis of real-world clinical data. We refine existing natural history data of SCN1A-related disorders by providing a granular landscape of seizures, comorbidities, and treatment approaches over time.
Brünger, T.; Krey, I.; Kim, S.; Klöckner, C.; Myers, S. J. A.; Johannesen, K. M.; Stefanski, A.; Taylor, G.; Perez-Palma, E.; Macnee, M.; Schorge, S.; Dahl, R. S.; Yuan, H.; Perszyk, R. E.; Kim, S.; Bajaj, S.; Helbig, I.; Pan, J. Q.; Farrant, M.; Wollmuth, L.; Wyllie, D. J. A.; Kurganov, E.; Baez, D.; Zuberi, S.; Bosselmann, C. M.; Lerche, H.; Mantegazza, M.; Cestele, S.; May, P.; Ivaniuk, A.; Meskis, M. A.; Hood, V.; Schust, L.; Goodspeed, K.; Kang, J.-Q.; Freed, A.; Gati, C.; Montanucci, L.; Wuster, A.; Trinidad, M.; Froelich, S.; Deng, A. T.; Aledo-Serrano, A.; Borovikov, A.; Sharkov, A.;
Rare Mendelian disorders affect 300-400 million people globally. Although genetic testing has become widely adopted, gene-specific evidence for tailored variant interpretation remains scattered across resources. We present Gene Portals, a framework for gene-centered multimodal knowledge bases that co-localize expert-harmonized clinical data, functional assays, population variation, structural annotations and gene-specific ACMG/AMP specifications within a single resource. A modular interface integrates this unified evidence with VCEP-refined ACMG specifications to enable automated gene-specific variant classification, infer molecular mechanisms, and support cross-gene analyses. We demonstrate the framework's utility across five Gene Portals spanning eleven neurodevelopmental disorder-associated genes, integrating data from 4,423 individuals with 2,838 unique variants, 36,149 ClinVar submissions, and 1,044 expert-curated molecular readouts. By organizing evidence that is otherwise dispersed across multiple sources into a unified, queryable framework, the SCN, GRIN, CACNA1A, SATB2 and SLC6A1 Gene Portals have become widely used community resources and provide an extensible template for standardized rare-disease variant interpretation and mechanism-aware discovery.
Shang, Y.; Badonyi, M.; Marsh, J. A.
Interpreting the clinical significance of missense variants of uncertain significance (VUS) remains a major challenge in clinical genetics. Although computational variant effect predictors (VEPs) and multiplexed assays of variant effect (MAVEs) can generate large-scale functional scores, their value is typically assessed using discrimination metrics such as AUROC rather than by the strength of evidence they provide under ACMG/AMP guidelines. Here, we introduce mean evidence strength (MES), a quantitative metric that summarises the pathogenic and benign evidence assigned across missense variants following gene-level Bayesian calibration. Using the acmgscaler framework, we calibrated 12 population-free VEPs across 367 disease genes and analysed 15 MAVE datasets with sufficient clinical data. MES revealed important discrepancies with AUROC, including cases where methods with similar discrimination differed substantially in evidence yield. MAVEs achieved high average MES despite lower AUROC, while several VEPs showed strong discrimination but more limited calibrated evidence. Among predictors, CPT-1 achieved the highest MES and provided moderate or stronger evidence for the largest fraction of ClinVar VUS. MES therefore provides a practical framework for evaluating computational and experimental variant effect datasets in terms of calibrated clinical evidence yield.
Eisenhart, C. E.; Brickey, R.; Mewton, J.
The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of benchmark single nucleotide polymorphisms (SNPs) excludes rare or novel pathogenic variants that can invalidate a star allele call and lead to incorrect dosing recommendations. Furthermore, nearby non-pathogenic variants can interfere with haplotype interpretation, introducing additional risk of misclassification. To address these gaps, we developed PHARMWATCH, a multistep pharmacogenomics workflow for comprehensive variant analysis, allele tracking, and contextual interpretation. PHARMWATCH incorporates two algorithmic safeguards designed to identify genomic alterations that compromise star allele accuracy: (1) de novo germline variant screening using the ACMG-based BIAS-2015 classifier and (2) variant interpretation in context (VIIC) to validate the functional integrity of star allele-defining SNPs [3]. Together, these layers enhance the reliability of pharmacogenomic reporting, enabling safe, automated, and review-ready recommendations that extend beyond the constraints of traditional star allele-based approaches.
Rowlands, C. F.; Choi, S.; Allen, S.; Kuzbari, Z.; Cubuk, C.; Sultana, R.; Torr, B.; Durkie, M.; Burghel, G. J.; Robinson, R.; Callaway, A.; Field, J.; Frugtniet, B.; Palmer-Smith, S.; Grant, J.; Pagan, J.; Johnston, E.; McDevitt, T.; Hughes, L.; Yarram-Smith, L.; Logan, P.; Reed, L.; Snape, K.; McVeigh, T.; Hanson, H.; Garrett, A.; Turnbull, C.; CanVIG-UK,
Interpretation of germline variants in cancer susceptibility genes (CSGs) requires the collation of variant-level data from diverse sources, as well as the assembly of comprehensive clinical data, often necessitating sharing of information between genomic testing centers. Although a number of variant interpretation tools exist, there remains a need for a CSG-focused platform tailored to the diverse range of ClinGen variant curation expert panel guidance in these difficult-to-interpret genes. Here, we describe CanVar-UK, a freely accessible web platform to assist in the interpretation of germline CSG variants. CanVar-UK contains variant-level data for over 1.7 million single nucleotide variants, comprising all coding variants in 115 established CSGs. These data include: in silico scores from 11 tools of clinical relevance; population allele frequencies from gnomAD v4.1 and case counts from NHS genomic testing via linkage to the National Disease Registration Service; variant-level readouts from 31 different functional and splicing studies across 13 CSGs; genetic epidemiology studies of the BRCA1/2 genes; and live linkage to existing consensus classifications in the ClinVar database. CanVar-UK additionally includes a diagnostic discussion forum through which users can email the rest of the user base with queries and/or suggested classifications, facilitating the exchange of clinical and classification data between diagnostic centers. Already widely used by the NHS clinical workforce in the CSG space (with 879 registered NHS users), CanVar-UK has a rapidly growing international user base, with 607 registered users based outside the UK. We believe CanVar-UK to be an invaluable resource for germline CSG variant interpretation.
Schreiner, P. A.; Markianos, K.; Francis, M.; Despard, B.; Gorman, B. R.; Said, I.; Dong, F.; Gautam, S.; Dochtermann, D.; Shi, Y.; Devineni, P.; Kirkpatrick, C.; Khazanov, N.; Moser, J.; Million Veteran Program; Huang, G. D.; Muralidhar, S.; Tsao, P. S.; Pyarajan, S.
The Million Veteran Program (MVP) represents the largest and one of the most diverse single cohorts associated with longitudinal Electronic Health Record (EHR) data. We profiled a subset of samples from MVP using the Illumina Infinium MethylationEPIC BeadChip (EPIC array) to generate one of the largest single-cohort methylation datasets to date. Methylation profiles were analyzed for 45,460 total individuals, with the most populous ancestries composed of 27,455 Europeans, 11,798 African Americans, and 4,859 Admixed Americans. We detail the strict quality control standards implemented to ensure the most robust method of methylation profiling of the MVP cohort. This dataset was then applied to evaluate the effects of smoking exposure on DNA methylation in MVP participants. Ancestry-stratified epigenome-wide association studies (EWAS) of smoking status (ever/never) were performed using over 750,000 probes with certifiable signal. Our multi-ancestry meta-analysis demonstrates replicability with existing EWAS and identifies 3,207 novel probe-smoking associations unlocked via the depth and breadth of data in this cohort.
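One common way to combine ancestry-stratified EWAS results for a single probe is a fixed-effect, inverse-variance-weighted meta-analysis. The abstract does not state which meta-analysis model was used, so the sketch below is an assumption for illustration only; the function name and the input values are hypothetical.

```python
import math

def fixed_effect_meta(estimates):
    """Pool per-ancestry (beta, standard_error) pairs for one probe.

    Assumption: fixed-effect inverse-variance weighting; the abstract
    does not specify the model the authors actually used.
    """
    if not estimates:
        raise ValueError("need at least one (beta, se) pair")
    # Each stratum is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    return pooled_beta, pooled_se, z_score

# Hypothetical per-ancestry effect sizes for a single probe.
pooled_beta, pooled_se, z_score = fixed_effect_meta([(0.2, 0.1), (0.0, 0.1)])
```

With equal standard errors the pooled estimate is the simple average of the stratum estimates, and the pooled standard error shrinks by a factor of the square root of the number of strata.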
Li, W.; Bhat, V.; Yu, T.; Lebo, M.; Zitnik, M.; Cassa, C. A.
Background: Most rare coding variants in monogenic disease genes remain classified as Variants of Uncertain Significance (VUS), limiting their use in clinical care. Many variant classifications have been submitted to ClinVar, often with rich free-text summaries of the evidence underlying each classification. These narratives are not standardized and are difficult to mine systematically, making it challenging to identify variants that might be reclassified as new evidence becomes available. Methods: We developed a two-stage language-model pipeline that (i) detects whether functional, population, or computational evidence is described in ClinVar and ClinGen variant summaries, and (ii) classifies whether it is evidence of pathogenicity or benignity. We first constructed Variant Evidence Text Annotations (VETA), a dataset of 44,522 ACMG/AMP keyword-description pairs derived from 18,678 ClinVar and ClinGen variant summaries using an LLM-based consensus annotation procedure. We then fine-tuned BioBERT-large models for each evidence type and stage, and validated performance using independent ClinGen expert-curated summaries as well as orthogonal variant-level evidence, including functional screening, computational scores, and population estimates of disease impact. Results: Across evidence types, our models accurately identify whether functional, population, and computational evidence is present and whether it leans toward a pathogenic or benign impact. We find high agreement with ClinGen expert annotations and highly significant separation of validation scores between model-predicted benign and pathogenic groups (functional assays p = 8.13 × 10^-30, variant allele frequencies p = 4.11 × 10^-22, computational predictions p < 8.88 × 10^-16). We applied the full workflow to approximately 6,000 ClinVar VUS variants whose submission summaries lacked explicit functional or population evidence. 
By aggregating external functional, population, computational, and diagnostic evidence using the ACMG/AMP SVI point-based framework, we found that about 17% of these VUS meet quantitative thresholds for a likely benign or likely pathogenic classification, including 492 VUS in genes reviewed by ClinGen Variant Curation Expert Panels. Conclusions: Transforming unstructured variant summaries into a structured, evidence-type matrix enables scalable detection of evidence gaps, allowing for the systematic integration of new data sources, and prioritization of VUS that are most likely to be reclassified. This language model-enabled pipeline provides a generalizable digital approach to identify clinical evidence gaps as functional screens, biobank resources, and computational predictors continue to evolve.
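To make the point-based step concrete, here is a minimal sketch of how evidence strengths can be summed and mapped to a classification. It assumes the commonly published ACMG/AMP point scale (supporting = 1, moderate = 2, strong = 4, very strong = 8, with benign evidence counted as negative) and its usual thresholds; the abstract does not list the exact values the authors applied, and the function names and example evidence are hypothetical.

```python
# Assumed point values per evidence strength (published point-based scale);
# not confirmed by the abstract as the authors' exact configuration.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}

def total_points(evidence):
    """Sum points over (direction, strength) pairs.

    direction is "pathogenic" (positive points) or "benign" (negative points).
    """
    total = 0
    for direction, strength in evidence:
        if direction not in ("pathogenic", "benign"):
            raise ValueError(f"unknown direction: {direction}")
        sign = 1 if direction == "pathogenic" else -1
        total += sign * POINTS[strength]
    return total

def classify(points):
    """Map a point total to a five-tier class (assumed thresholds)."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "Uncertain significance"
    if points >= -6:
        return "Likely benign"
    return "Benign"

# Hypothetical VUS that gains strong functional and moderate population evidence.
example = [("pathogenic", "strong"), ("pathogenic", "moderate")]
example_class = classify(total_points(example))
```

Under these assumed thresholds, a variant only leaves the uncertain range once new evidence pushes its total to 6 or above, or below 0, which is why aggregating external evidence can reclassify a subset of VUS.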
Glasenapp, M. R.; Yee, M.-C.; Symons, A. E.; Cornejo, O. E.; Garcia, O. A.
Accurate HLA typing is critical for transplantation, pharmacogenomics, and disease risk prediction, yet short-read approaches cannot resolve the HLA region's extreme polymorphism. Long-read sequencing improves resolution, but its adoption has been limited by higher cost, reduced base accuracy, limited throughput, and reliance on long-range PCR. To overcome these limitations, we present a multiplexed long-read hybrid capture workflow for PacBio and Oxford Nanopore sequencing that enriches all classical HLA loci and the complete HLA Class III region. A single-step enzymatic fragmentation and barcoding strategy enables automated library prep. We also introduce HLA-Resolve, an HLA typing program optimized for HiFi reads, and validate workflow performance against the Genome in a Bottle, Human Pangenome Reference Consortium, and International Histocompatibility Working Group benchmarks using 32 geographically diverse samples. These advances offer a cost-effective approach for high-resolution HLA typing with clinical applicability and enable investigation of the role of HLA Class III variation in disease.
Abderrazzaq, H.; Singh, M.; Babb, L.; Bergquist, T.; Brenner, S. E.; Pejaver, V.; O'Donnell-Luria, A.; Radivojac, P.; ClinGen Computational Working Group; ClinGen Variant Classification Working Group
Insertions and deletions (indels) represent a substantial source of genetic variation in humans and are associated with a diverse array of functional consequences. Despite their prevalence and clinical importance, indels, particularly short in-frame indels, remain critically understudied compared to single nucleotide variants and are challenging to interpret clinically. While many computational predictors for missense variants have been rigorously evaluated and calibrated for clinical use, the clinical utility of tools for in-frame indels remains uncertain. To address this gap, we have calibrated in-frame indel prediction tools for clinical variant classification. We constructed a high-confidence dataset of in-frame indel variants (≤ 50 bp) from clinical and population databases and estimated the prior probability of pathogenicity of a rare in-frame indel observed in a disease-associated gene, and of insertions and deletions separately. Using a previously developed statistical framework based on local posterior probabilities, we then established score thresholds for eight computational tools, corresponding to distinct evidence levels for pathogenic and benign classification according to ACMG/AMP guidelines. All in-frame indel predictors evaluated here reached multiple evidence levels of pathogenicity and/or benignity, demonstrating measurable clinical value. However, these models consistently exhibited lower performance levels compared to missense predictors, highlighting the need for improved computational approaches for indel classification.
Terrazzan, A.; Ancona, P.; Carbone, F. P.; Trevisan, P.; Zuccato, C.; Szymanek, E. A.; Szelag, M.; Brugnoli, F.; Zaczek, A.; Gaj, P.; Swierniak, M.; Calabro, L.; Agnoletto, C.; Palatini, J.; Bianchi, N.; Duchnowska, R.; Senkus, E.; Jazdzewski, K.; Kaminski, T. S.; Volinia, S.
Circulating tumour cells (CTCs) represent a minimally invasive method for monitoring cancer evolution in patients. CTCs are nowadays commonly isolated using antibodies against EPCAM protein. A key limitation concerns EPCAM-negative CTCs, such as those that undergo EMT or whose tumour of origin is EPCAM-low or negative. We studied 3,302 single-cell RNA transcriptomes reported as CTCs in public repositories. Using copy number variation and cell type-specific markers, we discriminated bona fide CTCs from contaminating blood cells, often mislabelled as CTCs. The integration of bona fide CTCs and PBMCs from multiple datasets allowed us to identify novel markers, such as CLDN4, CLDN7, EFNA1 and TACSTD2 for epithelial CTCs, KCNK15 and LY6K for epithelial B CTCs, and ITGB4 for both epithelial B and mesenchymal CTCs. We revealed PODXL, AXL, CAV1, and TGM2 as markers of mesenchymal CTCs, which might be undetectable using anti-EPCAM antibodies, and TM4SF1 as a universal marker, expressed in all CTC subclasses. Additionally, we found platelets to be physically associated with the epithelial A, but not with the epithelial B or the mesenchymal subtypes. Finally, we developed and implemented CTCeek, the first web-based and public reference tool that automatically annotates bona fide CTCs from scRNA-sequencing profiles.
Arbona, J. S.; Garcia Samartino, C.; Angeloni, A. R.; Vaquer, C. C.; Wetten, P. A.; Bocanegra, V.; Militello, R. D.; Sanguinetti, G.; Correa, A.; Pellegrini, P.; Carlen, M.; Minatti, W. R.; Vaschalde, G. A.; Perez, R.; Manzino, R. N.; Rodriguez, J. D.; Valdemoros, P.; Sarrio, L.; Ledesma, A.; Campoy, E. M.
DNA methylation biomarkers for cancer diagnostics often underperform when tumor and background tissues share epigenetic programs, or when complex specimens with mixed cellular composition dilute tumor-derived signals and increase variability. To address these limitations, we developed a gene-centric, browser-based discovery platform that integrates genome-wide methylomes with matched transcriptomes and reference layers spanning pan-cancer tissues and leukocytes, enabling background-aware filtering beyond binary tumor-normal contrasts. Candidate loci are prioritized using combined thresholds on methylation effect size and intra-group variability to penalize stochastic and heterogeneous variation. In colorectal cancer, methylation-sensitive restriction enzyme quantitative PCR (MSRE-qPCR) validation in independent tissue cohorts confirmed multiple candidate loci with AUCs of 0.81-1.00. Using the same framework, MSRE-qPCR validation distinguished hepatocellular carcinoma from cirrhotic liver, and analysis of public tumor methylomes identified subtype-specific markers in lung adenocarcinoma and squamous-cell carcinoma. This resource bridges genome-scale epigenomic discovery with clinically accessible PCR-based methylation assays.
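The combined-threshold idea described above (a large methylation difference together with low intra-group variability) can be sketched as a simple filter. This is an illustration only: the threshold values, the function name, and the toy beta values are hypothetical and are not taken from the platform.

```python
from statistics import mean, stdev

def prioritize_loci(loci, min_delta=0.3, max_sd=0.1):
    """Keep loci with a large tumor-vs-background methylation difference
    and low within-group spread, ranked by absolute effect size.

    loci maps a locus name to (tumor_betas, background_betas).
    min_delta and max_sd are hypothetical cut-offs chosen for illustration.
    """
    ranked = []
    for name, (tumor, background) in loci.items():
        delta = mean(tumor) - mean(background)
        # Penalize heterogeneous loci: take the worse of the two group spreads.
        spread = max(stdev(tumor), stdev(background))
        if abs(delta) >= min_delta and spread <= max_sd:
            ranked.append((name, delta, spread))
    ranked.sort(key=lambda row: abs(row[1]), reverse=True)
    return ranked

# Toy beta values (fraction methylated); entirely made up.
toy_loci = {
    "locus_a": ([0.80, 0.85, 0.82], [0.10, 0.12, 0.08]),  # large, consistent difference
    "locus_b": ([0.90, 0.20, 0.60], [0.10, 0.10, 0.10]),  # large but highly variable
    "locus_c": ([0.50, 0.52, 0.51], [0.45, 0.44, 0.46]),  # consistent but small
}
selected = [name for name, _, _ in prioritize_loci(toy_loci)]
```

Only `locus_a` passes both criteria here: `locus_b` fails the variability cut-off and `locus_c` fails the effect-size cut-off.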
Boquett, J. A.; Lin, S. Y.-T.; House, J. S.; Ahn, K.; Suseno, R.; BakenRa, A.; Guthrie, K.; Wright, M.; Motsinger-Reif, A.; Maiers, M.; Hollenbach, J. A.
Background: Variation in the HLA loci, located on human chromosome 6p, has been associated with hundreds of diseases and conditions. However, high levels of polymorphism that characterize the HLA system, coupled with generally modest effect sizes for most phenotypes, necessitate relatively large sample sizes to power association studies; meanwhile, high-resolution HLA genotyping remains relatively resource intensive. These constraints limit identification of novel associations. While phenome-wide association studies (PheWAS) in the context of large registries with available electronic health records (EHR) have revealed new insights into the role of HLA in disease, many common health conditions are poorly represented in EHR due to the temporal nature of their occurrence or general underreporting. Further, these studies have generally been conducted with HLA genotyping data imputed from microarrays, rather than direct measurement of high-resolution genotypes. Objective: To overcome these limitations and reveal novel HLA associations, we undertook a PheWAS in many previously understudied health conditions. Methods: We queried more than 300 conditions, diseases and traits from 70,724 subjects registered with NMDP with available high-resolution HLA genotyping (HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1). After stratifying according to ancestry, we performed a logistic regression analysis adjusting for sex and age for HLA-phenotype association. Results: We identified 48 significant HLA associations across ancestry groups, confirming several known associations and uncovering fifteen novel associations. Most novel associations pertained to common infectious or allergic phenotypes that often go under-reported in the EHR. Of particular translational importance, we identified a previously undetected yet very strong association between HLA-DRB1*04:01 and sensitivity to cefaclor, a specific cephalosporin (OR = 3.74, p = 5.10 × 10^-28). 
Molecular docking simulations predict cefaclor binding in the P4 pocket of HLA-DRB1*04:01, with substantially greater affinity than non-associated antibiotics, including other cephalosporins. This pharmacogenomic signal highlights an opportunity for risk stratification and targeted prevention of adverse drug reactions. Other novel associations found, such as susceptibility to genital warts (HPV) and allergic rhinitis, reveal new insights into the role of specific HLA alleles in immune-mediated disease. The vast majority of these novel associations were replicated in the independent All of Us cohort, confirming the validity of this approach. Conclusion: Collectively, our findings demonstrate the value of integrating population-scale, high-resolution HLA genotypes with phenotyping beyond the EHR to reveal immunogenetic influences on common health outcomes. They also point to immediate translational avenues, particularly for drug hypersensitivity, while motivating future functional studies and prospective clinical validation to refine mechanistic understanding and clinical utility.
Gui, J.; Zhang, M.; Kan, Z.; He, X.; Gao, M.; Han, J.; Wang, Q.; Zhang, S.; Hu, J.; Qin, W.; Bi, Z.; Huang, B.; Wu, Z.; Ran, J.
GlycoRNAs, newly identified RNA molecules bearing glycan modifications on cell membranes, are implicated in cell communication and immune regulation. However, current technological limitations impede a thorough elucidation of their biological roles and clinical significance. Here, we developed Nucleotides Hybridization and Aptamer-based Proximity Ligation (NHAPL), a homogeneous assay enabling sensitive and quantitative glycoRNA analysis from 160 pg total cell RNA and 1 µl serum. NHAPL integrates dual recognition by a sialic acid aptamer and RNA binding probe, followed by ligation and qPCR amplification. We further established multiplexed NHAPL for simultaneous detection of multiple glycoRNAs. Using NHAPL, we uncover for the first time that protein-coding mRNAs, specifically 3' untranslated region (3'UTR) fragments of FNDC3B and CTSS, undergo sialic acid-containing N-glycosylation on the cell surface. These glycoRNAs functionally promote monocyte adhesion to endothelial cells and hepatoma cell migration, revealing a direct role in cell-cell interactions and cancer-related phenotypes. Applying multiplexed NHAPL to human serum, we identify glycoRNA signatures highly specific to systemic lupus erythematosus (SLE). In particular, glycoY5 and glycoU1 achieve near-complete discrimination between patients and healthy controls (area under the curve (AUC) = 1.00 and 0.9977), whereas conventional total RNA analysis fails to capture these differences, highlighting RNA glycosylation modification as a distinct regulatory layer. The simplicity and flexibility of NHAPL make it well suited for clinical glycoRNA profiling and biomarker discovery. Overall, NHAPL represents a robust and versatile platform for advancing glycoRNA research and diagnostic development.
Cherchi, I.; Orlando, F.; Quaini, O.; Paoli, M.; Ciani, Y.; Demichelis, F.
The T2T-CHM13v2.0 reference genome added previously uncharacterized genomic sequences and improved the accuracy of repetitive stretches compared to former human genome assemblies. By comprehensive allelic variation analysis and read mapping statistics from sequencing reads aligned to hg38 and T2T-CHM13 assemblies in samples encompassing different sequencing designs and ethnicity groups, we observed that T2T-CHM13v2.0 assembly significantly reduces the reference mapping bias (RMB) and increases read mapping precision at clinically relevant sites, including BRCA1 pathogenic variants. Further, we report the presence of sequence dissimilarities among reference genomes in the proximity of ClinVar annotated variants, suggesting the need for data re-analysis and potential redesign of probes targeting clinically relevant regions. Overall, these findings support the implementation of T2T-CHM13 reference for the improvement of sequencing data analyses in the clinical genomic setting.
Gonzalez, E.; Villaman, C.; Rebolledo-Jaramillo, B.; Hernandez, C. F.; Bustos, B. I.; Berrios, D.; Moreno, G.; Posey, J. E.; Lupski, J. R.; Poli, C.; Calderon, J. F.; Munoz-Venturelli, P.; Fernandez, M. I.; Lecaros, J. A.; Armisen, R.; Repetto, G. M.; Perez-Palma, E.
Background: Genetic studies have disproportionately focused on populations of European ancestry, limiting the generalizability of allele-frequency references and genetic associations to underrepresented groups, including South American populations. This gap is particularly relevant for rare diseases and cancer, where accurate variant interpretation depends in part on appropriate population context. In addition, population-specific haplotype structure influences genome-wide association analyses and the portability of polygenic scores across ancestries. By focusing on the Chilean population, our work aims to bridge these gaps, providing more accurate reference data for genetic research and enhancing diagnostic and therapeutic strategies for underrepresented communities. Methods: We aggregated exomes from seven Chilean cohorts and processed samples using a standardized best practices workflow, including population-scale quality control and relatedness filtering. Aggregated variants were annotated with VEP (including LOFTEE, REVEL, AlphaMissense, EVE, and CADD) and clinically classified using InterVar. Population structure was assessed with PCA/admixture. We compared overlap and allele frequencies against external references, performed gene-level variant burden analysis, and constructed phased haplotypes for ancestry association using linear regression with Bonferroni correction. Finally, we developed a dashboard with the R Shiny framework to enable gene-wise exploration of annotated variants. Results: We present CHANGER (Chilean Aggregated National Genomics Resource), a comprehensive aggregation of 902 unrelated Chilean exomes designed to create a detailed reference of genetic variation in Chile. By incorporating data from multiple cohorts, we identified 774,110 unique genetic variants, with an average of 42,601 aggregated variants per individual. We identified 132,363 novel variants, of which 31,470 were common within our cohort (frequency > 1%). 
We provide variant annotation and direct comparisons with other publicly available general population references. Beyond variant discovery, CHANGER supports gene-level burden scans (61 Missense-Damaging depleted genes; 2 Loss-of-Function enriched genes) and a haplotype resource for ancestry association (51,171 haplotypes), enabling downstream interpretation, improved imputation, and more powerful genome-wide association analyses in Chileans. Conclusions: CHANGER provides a valuable resource for genetic research and a reference for local variant interpretation. Importantly, CHANGER follows a high-quality methodological baseline and an open, FAIR-aligned infrastructure designed to grow, increasing its value for discovery, imputation, and equitable clinical interpretation over time.
Park, S.; Seo, M.; Park, C. H.; Park, H.-Y.; Kim, Y. J.; Kim, B.-J.
Pharmacogenomics is an essential component of precision medicine; however, most existing knowledge has been derived from populations of European ancestry, limiting the understanding of pharmacogenomic diversity in East Asian populations. In this study, we applied genotype imputation to the Korea Biobank Array v2.0 using a reference panel of 8,062 Korean whole-genome sequencing (WGS) samples and analyzed pharmacogenomic variants and phenotypes in 14,490 Korean individuals. To assess the accuracy of imputation-based variant detection, we compared imputed genotypes with matched WGS data from 735 individuals and with genotypes obtained from the commercial PangenomiX Plus Array Kit for an additional 137 individuals, demonstrating high concordance. When extended to the full cohort, all individuals were found to carry at least one pharmacogenomic variant, with high frequencies observed in key pharmacogenes including CYP2C19, SLCO1B1, CYP3A5, and VKORC1. Phenotype distributions were broadly consistent with previous WGS-based studies in East Asians but showed notable differences compared with European populations. Overall, this population-specific, large-scale analysis provides a comprehensive pharmacogenomic landscape in Koreans and highlights the importance of ancestry-tailored data for equitable precision medicine.
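As a rough illustration of the concordance check described above, the sketch below compares imputed genotypes against matched reference genotypes and reports the fraction that agree. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `genotype_concordance`, the sample and variant identifiers, and the tuple-of-alleles genotype encoding are all hypothetical, and genotypes are compared without regard to phase.

```python
def genotype_concordance(imputed, truth):
    """Fraction of shared (sample, variant) calls where genotypes agree.

    Both arguments map (sample_id, variant_id) -> tuple of allele codes,
    e.g. (0, 1) for a heterozygote. Phase is ignored. Calls missing from
    either set (absent key or None value) are skipped.
    """
    compared = 0
    matches = 0
    for key, true_gt in truth.items():
        imputed_gt = imputed.get(key)
        if imputed_gt is None or true_gt is None:
            continue
        compared += 1
        if sorted(imputed_gt) == sorted(true_gt):
            matches += 1
    if compared == 0:
        raise ValueError("no overlapping genotype calls to compare")
    return matches / compared


# Hypothetical toy data: two samples, two variants.
wgs_calls = {
    ("sample1", "variantA"): (0, 1),
    ("sample1", "variantB"): (1, 1),
    ("sample2", "variantA"): (0, 0),
    ("sample2", "variantB"): (0, 1),
}
imputed_calls = {
    ("sample1", "variantA"): (1, 0),  # same genotype, different allele order
    ("sample1", "variantB"): (1, 1),
    ("sample2", "variantA"): (0, 0),
    ("sample2", "variantB"): (1, 1),  # discordant call
}

concordance = genotype_concordance(imputed_calls, wgs_calls)
print(f"concordance = {concordance:.2f}")
```

In practice concordance is often reported separately per variant or per allele-frequency bin, which this toy version does not attempt.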
Chang, H.-C.; Shi, Y.; Cheng, H.; Zou, J.; Chang, A. C.-C.; Schlegel, B. T.; Wang, W.; Brown, D. D.; Chen, F.; Wang, S.; Li, D.; Sai, R.; Michel, N.; Oesterreich, S.; Lee, A. V.; Tseng, G. C.
Accurately inferring copy number variation (CNV) from scRNA-seq data is critical for identifying malignant cells, reconstructing tumor subclonal architecture, and uncovering the genomic drivers that dictate cancer cell biology. However, the performance of existing tools varies significantly, and current benchmarks lack the breadth of datasets and methods necessary to provide definitive guidance. We present a comprehensive benchmark of 12 CNV inference methods across 28 real datasets (>100,000 cells) and diverse synthetic datasets. By evaluating methods on malignant cell classification accuracy, CNV inference accuracy, scalability, and robustness, we establish a practitioner's guideline: allele-aware methods like Numbat excel when high-quality allelic inference can be achieved, whereas expression-centric tools such as Clonalscope, CopyKAT, inferCNV, and SCEVAN remain reliable when raw sequencing data are unavailable. Our study provides both a practical decision-making framework for researchers and a public repository of standardized CNV profiles to catalyze further methodological innovation.
Meng, M.; Liu, L.; Du, Q.; Zhou, X.; Tian, Y.; Sun, K.; Li, N.; Zhang, P.; Lian, X.; Fan, N.; Zhu, N.; Li, S.; Mao, A.; Li, Y.; Zou, G.
Background: Artificial intelligence (AI)-driven variant prioritization has demonstrated substantial utility in expediting genetic diagnosis by ranking the most likely causative variants. While a variety of tools have been developed, few address the unique clinical and technical constraints in prenatal genetic diagnosis. Methods: We introduce Berrylyzer, a novel, end-to-end variant prioritization system applied to prenatal diagnosis. Inspired by clinicians' reasoning process during variant interpretation, Berrylyzer applies a modular, stepwise scoring architecture that jointly integrates phenotypic and genomic evidence and delivers a ranked list of candidate variants, achieving high computational efficiency without compromising analytical rigor. Moreover, Berrylyzer natively supports both structured ontologies and free-text clinical narratives, enabling flexible integration into diverse clinical environments. Its performance was rigorously evaluated across two independent, real-world prenatal cohorts and benchmarked against three state-of-the-art methods: Xrare, Exomiser, and PhenIX. Results: Across the two datasets, Berrylyzer ranked 56.41% and 58.12% of diagnostic variants first, and achieved recall rates of 94.02% and 97.42% within the top 20, respectively. Berrylyzer outperformed Xrare (85.19% and 87.08%), Exomiser (84.90% and 85.98%), and PhenIX (82.05% and 88.93%). Stratified analysis consistently demonstrated superior performance across diverse disease categories, inheritance patterns, and analytical strategies. Notably, Berrylyzer was robust to the form of phenotype input, yielding comparable top 20 recall rates for free-text descriptions and standardized terminologies. Conclusion: Berrylyzer represents an accurate, interpretable, and computationally lightweight variant prioritization system for prenatal genetic diagnosis. Its superior performance across heterogeneous diagnostic contexts makes it a practical solution for seamless integration into clinical pipelines, thereby advancing precision medicine in prenatal settings.
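For readers unfamiliar with the metric, top-k recall as reported above can be computed from the rank each tool assigns to the known diagnostic variant in each case. The following is a minimal sketch, assuming one diagnostic variant per case and using `None` for cases where the tool did not return the variant at all; the function name and the toy ranks are hypothetical and do not reproduce any figure from the abstract.

```python
def top_k_recall(ranks, k):
    """Fraction of cases whose diagnostic variant is ranked within the top k.

    `ranks` holds the 1-based rank of the diagnostic variant per case,
    or None when the variant was not returned by the tool.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Hypothetical ranks for eight solved cases.
example_ranks = [1, 3, 1, 25, None, 7, 1, 18]

top1 = top_k_recall(example_ranks, 1)
top20 = top_k_recall(example_ranks, 20)
print(f"top-1 recall = {top1:.3f}, top-20 recall = {top20:.3f}")
```

Counting unreturned variants as misses keeps the denominator equal to the number of solved cases, so tools that filter aggressively are not rewarded for dropping hard cases.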
Behera, S.; Rossi, M.; Wang, Y.; Izydorczyk, M. B.; Tran, D.; Dalgard, C. L.; Kalef-Ezra, E.; Kottapalli, K.; Mehta, H.; Parnaby, G.; Risse-Adams, O. S.; Scholz, S. W.; Shen, H.; Nelson, T. M.; Visvanath, A.; Zheng, X.; Doddapaneni, H.; Garcia, T. X.; Mason, C. E.; Proukakis, C.; Han, J.; Mehio, R.; Catreux, S.; Sedlazeck, F.
Detecting low variant allele fraction (VAF) mosaic variants without matching controls remains a major challenge in genomics, limited by technical noise, lack of benchmarks, and computational scalability. We present the DRAGEN mosaic caller, a hardware-accelerated approach identifying variants down to ~1-2% VAF with low false-positive rates and hour-scale runtimes for mosaic SNV/indel detection from bulk sequencing. To support evaluation, we introduce a genome-wide low-VAF benchmark for variants between 1-10% VAF. Application to blood, sperm, and brain tissues revealed patterns, including mosaic hotspots and mutational signatures. The first analysis of HG002 blood showed that many "mosaic" variants defined from HG002 cell lines are likely culture-derived and not in vivo mutations. Importantly, DRAGEN also enables personalized assembly pangenome references to improve alignment and mosaic variant detection in complex regions. Together, this development makes routine low-VAF discovery feasible, opening new opportunities to study mosaic mutations in healthy and diseased individuals.
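To make the VAF ranges above concrete, the sketch below computes a variant allele fraction from read counts and keeps calls falling in a 1-10% window. It is a minimal sketch, assuming VAF is defined as alternate-supporting reads divided by total informative reads; the function names and read counts are hypothetical, and it does not reflect how the DRAGEN mosaic caller models noise or filters calls.

```python
def variant_allele_fraction(ref_reads, alt_reads):
    """VAF = alt-supporting reads / (ref-supporting + alt-supporting reads)."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no informative reads at this site")
    return alt_reads / depth


def in_low_vaf_window(vaf, low=0.01, high=0.10):
    """True when the VAF lies inside the inclusive [low, high] window."""
    return low <= vaf <= high


# Hypothetical (ref_reads, alt_reads) pairs for four candidate sites.
calls = [(490, 10), (250, 250), (995, 5), (185, 15)]

low_vaf_calls = [
    counts for counts in calls
    if in_low_vaf_window(variant_allele_fraction(*counts))
]
print(low_vaf_calls)
```

Here the 50% site looks like a germline heterozygote and the 0.5% site falls below the window, so only the 2% and 7.5% sites are retained.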